## 'data.frame': 113937 obs. of 81 variables:
## $ ListingKey : Factor w/ 113048 levels "00003546482094282EF90E5",..: 7180 7192 6492 6514 6531 6534 6544 6551 6532 6532 ...
## $ ListingNumber : int 193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
## $ ListingCreationDate : Factor w/ 34808 levels "00:00.0","00:00.1",..: 5435 16500 452 1487 22553 15628 30761 28759 25437 25437 ...
## $ CreditGrade : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
## $ Term : int 36 36 36 36 36 60 36 36 36 36 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ ClosedDate : Factor w/ 2803 levels "","02:17.6","02:25.0",..: 1214 1 1005 1 1 1 1 1 1 1 ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ BorrowerRate : num 0.158 0.092 0.275 0.0974 0.2085 ...
## $ LenderYield : num 0.138 0.082 0.24 0.0874 0.1985 ...
## $ EstimatedEffectiveYield : num NA 0.0796 NA 0.0849 0.1832 ...
## $ EstimatedLoss : num NA 0.0249 NA 0.0249 0.0925 ...
## $ EstimatedReturn : num NA 0.0547 NA 0.06 0.0907 ...
## $ ProsperRating..numeric. : int NA 6 NA 6 3 5 2 4 7 7 ...
## $ ProsperRating..Alpha. : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
## $ ProsperScore : int NA 7 NA 9 4 10 2 4 9 11 ...
## $ ListingCategory..numeric. : int 0 2 0 16 2 1 1 2 7 7 ...
## $ BorrowerState : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
## $ Occupation : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
## $ EmploymentStatus : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
## $ EmploymentStatusDuration : int 2 44 NA 113 44 82 172 103 269 269 ...
## $ IsBorrowerHomeowner : logi TRUE FALSE FALSE TRUE TRUE TRUE ...
## $ CurrentlyInGroup : logi TRUE FALSE TRUE FALSE FALSE FALSE ...
## $ GroupKey : Factor w/ 707 levels "","00943382969547936B0C529",..: 1 1 335 1 1 1 1 1 1 1 ...
## $ DateCreditPulled : Factor w/ 100274 levels "00:00.0","00:00.2",..: 94244 85147 2976 27865 74287 54958 61547 65014 55731 55731 ...
## $ CreditScoreRangeLower : int 640 680 480 800 680 740 680 700 820 820 ...
## $ CreditScoreRangeUpper : int 659 699 499 819 699 759 699 719 839 839 ...
## $ FirstRecordedCreditLine : Factor w/ 11586 levels "","1947/8/24 0:00",..: 8389 6700 9012 2312 9583 516 8349 7769 5633 5633 ...
## $ CurrentCreditLines : int 5 14 NA 5 19 21 10 6 17 17 ...
## $ OpenCreditLines : int 4 14 NA 5 19 17 7 6 16 16 ...
## $ TotalCreditLinespast7years : int 12 29 3 29 49 49 20 10 32 32 ...
## $ OpenRevolvingAccounts : int 1 13 0 7 6 13 6 5 12 12 ...
## $ OpenRevolvingMonthlyPayment : int 24 389 0 115 220 1410 214 101 219 219 ...
## $ InquiriesLast6Months : int 3 3 0 0 1 0 0 3 1 1 ...
## $ TotalInquiries : int 3 5 1 1 9 2 0 16 6 6 ...
## $ CurrentDelinquencies : int 2 0 1 4 0 0 0 0 0 0 ...
## $ AmountDelinquent : int 472 0 NA 10056 0 0 0 0 0 0 ...
## $ DelinquenciesLast7Years : int 4 0 0 14 0 0 0 0 0 0 ...
## $ PublicRecordsLast10Years : int 0 1 0 0 0 0 0 1 0 0 ...
## $ PublicRecordsLast12Months : int 0 0 NA 0 0 0 0 0 0 0 ...
## $ RevolvingCreditBalance : int 0 3989 NA 1444 6193 62999 5812 1260 9906 9906 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ AvailableBankcardCredit : int 1500 10266 NA 30754 695 86509 1929 2181 77696 77696 ...
## $ TotalTrades : int 11 29 NA 26 39 47 16 10 29 29 ...
## $ TradesNeverDelinquent..percentage. : num 0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
## $ TradesOpenedLast6Months : int 0 2 NA 0 2 0 0 0 1 1 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ IncomeRange : Factor w/ 8 levels "$0 ","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
## $ IncomeVerifiable : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanKey : Factor w/ 113044 levels "00003683605746079487FF7",..: 100315 69815 46242 70754 71365 86483 91228 5294 881 881 ...
## $ TotalProsperLoans : int NA NA NA NA 1 NA NA NA NA NA ...
## $ TotalProsperPaymentsBilled : int NA NA NA NA 11 NA NA NA NA NA ...
## $ OnTimeProsperPayments : int NA NA NA NA 11 NA NA NA NA NA ...
## $ ProsperPaymentsLessThanOneMonthLate: int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPaymentsOneMonthPlusLate : int NA NA NA NA 0 NA NA NA NA NA ...
## $ ProsperPrincipalBorrowed : num NA NA NA NA 11000 NA NA NA NA NA ...
## $ ProsperPrincipalOutstanding : num NA NA NA NA 9948 ...
## $ ScorexChangeAtTimeOfListing : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanCurrentDaysDelinquent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ LoanFirstDefaultedCycleNumber : int NA NA NA NA NA NA NA NA NA NA ...
## $ LoanMonthsSinceOrigination : int 78 0 86 16 6 3 11 10 3 3 ...
## $ LoanNumber : int 19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ LoanOriginationDate : Factor w/ 1873 levels "2005/11/15 0:00",..: 484 1869 254 1366 1814 1648 1705 1722 1639 1639 ...
## $ LoanOriginationQuarter : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
## $ MemberKey : Factor w/ 90824 levels "00003397697413387CAF966",..: 10981 10212 33733 54932 19440 48049 60441 40985 26069 26069 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ LP_CustomerPayments : num 11396 0 4187 5143 2820 ...
## $ LP_CustomerPrincipalPayments : num 9425 0 3001 4091 1563 ...
## $ LP_InterestandFees : num 1971 0 1186 1052 1257 ...
## $ LP_ServiceFees : num -133.2 0 -24.2 -108 -60.3 ...
## $ LP_CollectionFees : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_GrossPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NetPrincipalLoss : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LP_NonPrincipalRecoverypayments : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PercentFunded : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Recommendations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsCount : int 0 0 0 0 0 0 0 0 0 0 ...
## $ InvestmentFromFriendsAmount : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Investors : int 258 1 41 158 20 1 1 1 1 1 ...
## [1] "ListingKey"
## [2] "ListingNumber"
## [3] "ListingCreationDate"
## [4] "CreditGrade"
## [5] "Term"
## [6] "LoanStatus"
## [7] "ClosedDate"
## [8] "BorrowerAPR"
## [9] "BorrowerRate"
## [10] "LenderYield"
## [11] "EstimatedEffectiveYield"
## [12] "EstimatedLoss"
## [13] "EstimatedReturn"
## [14] "ProsperRating..numeric."
## [15] "ProsperRating..Alpha."
## [16] "ProsperScore"
## [17] "ListingCategory..numeric."
## [18] "BorrowerState"
## [19] "Occupation"
## [20] "EmploymentStatus"
## [21] "EmploymentStatusDuration"
## [22] "IsBorrowerHomeowner"
## [23] "CurrentlyInGroup"
## [24] "GroupKey"
## [25] "DateCreditPulled"
## [26] "CreditScoreRangeLower"
## [27] "CreditScoreRangeUpper"
## [28] "FirstRecordedCreditLine"
## [29] "CurrentCreditLines"
## [30] "OpenCreditLines"
## [31] "TotalCreditLinespast7years"
## [32] "OpenRevolvingAccounts"
## [33] "OpenRevolvingMonthlyPayment"
## [34] "InquiriesLast6Months"
## [35] "TotalInquiries"
## [36] "CurrentDelinquencies"
## [37] "AmountDelinquent"
## [38] "DelinquenciesLast7Years"
## [39] "PublicRecordsLast10Years"
## [40] "PublicRecordsLast12Months"
## [41] "RevolvingCreditBalance"
## [42] "BankcardUtilization"
## [43] "AvailableBankcardCredit"
## [44] "TotalTrades"
## [45] "TradesNeverDelinquent..percentage."
## [46] "TradesOpenedLast6Months"
## [47] "DebtToIncomeRatio"
## [48] "IncomeRange"
## [49] "IncomeVerifiable"
## [50] "StatedMonthlyIncome"
## [51] "LoanKey"
## [52] "TotalProsperLoans"
## [53] "TotalProsperPaymentsBilled"
## [54] "OnTimeProsperPayments"
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"
## [57] "ProsperPrincipalBorrowed"
## [58] "ProsperPrincipalOutstanding"
## [59] "ScorexChangeAtTimeOfListing"
## [60] "LoanCurrentDaysDelinquent"
## [61] "LoanFirstDefaultedCycleNumber"
## [62] "LoanMonthsSinceOrigination"
## [63] "LoanNumber"
## [64] "LoanOriginalAmount"
## [65] "LoanOriginationDate"
## [66] "LoanOriginationQuarter"
## [67] "MemberKey"
## [68] "MonthlyLoanPayment"
## [69] "LP_CustomerPayments"
## [70] "LP_CustomerPrincipalPayments"
## [71] "LP_InterestandFees"
## [72] "LP_ServiceFees"
## [73] "LP_CollectionFees"
## [74] "LP_GrossPrincipalLoss"
## [75] "LP_NetPrincipalLoss"
## [76] "LP_NonPrincipalRecoverypayments"
## [77] "PercentFunded"
## [78] "Recommendations"
## [79] "InvestmentFromFriendsCount"
## [80] "InvestmentFromFriendsAmount"
## [81] "Investors"
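The counts are fairly similar across all credit ratings.
Borrower APR is concentrated between 0.1 and 0.38.
The borrower interest rate is concentrated between 0.05 and 0.35.
The lenders' estimated effective yield is concentrated between 0.05 and 0.3.
The plot shows that a rating of 4 is the most common.
The plot shows that credit grade C is the most common, followed by B and A. A higher grade indicates stronger repayment ability, and the lower grades make up a large share here, so borrowers' repayment ability is relatively weak overall.
The plot shows that the most common ProsperScore values (the custom risk score built from historical Prosper data) are 4, 6, and 8.
The plot shows that category 1 (Debt Consolidation) is the most common listing category.
The plot shows that Employed is the most common employment status.
The plot shows that the most common borrower state is CA (California), possibly because Prosper is headquartered in San Francisco.
The plot shows that the most common borrower occupation is Other, possibly because many borrowers did not fill in this field accurately.
The most common employment-status durations are 40-80 months, followed by 0-40 months; from 80 to 480 months the counts decline steadily.
Homeowners and non-homeowners are roughly equal in number.
More borrowers are not in a group than are in one.
(BankcardUtilization >= 1 is binned as High, 0.5 < BankcardUtilization <= 1 as mid, and 0 < BankcardUtilization <= 0.5 as low.)
The plot shows that borrowers' bankcard utilization (the share of their bankcard credit limit in use) is relatively high.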
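The binning rule described above can be sketched as follows. This is Python rather than the R used for the analysis, and the function name `bankcard_band` is hypothetical. It makes two assumptions that are not stated in the original: the bounds as written overlap at exactly 1, so this sketch counts 1 as mid and treats High as anything above 1; and a utilization of exactly 0 gets its own "0" band, since the regression tables below appear to report the Bankcard levels relative to a "0" baseline.

```python
import math
from typing import Optional


def bankcard_band(utilization: Optional[float]) -> Optional[str]:
    """Bin a BankcardUtilization value into "0", "low", "mid" or "High".

    Assumptions (not from the original analysis): exactly 1 counts as "mid",
    so "High" means strictly above 1; exactly 0 gets its own "0" band;
    missing (None/NaN) or negative values return None.
    """
    if utilization is None or math.isnan(utilization) or utilization < 0:
        return None
    if utilization == 0:
        return "0"
    if utilization <= 0.5:
        return "low"
    if utilization <= 1:
        return "mid"
    return "High"


# Values from the BankcardUtilization row of str() above, plus 1.0 and 1.3.
for value in [0.0, 0.21, 0.81, 1.0, 1.3, None]:
    print(value, bankcard_band(value))
```

Putting the boundary handling in one function makes it explicit which band a value of exactly 0.5 or 1 lands in, which the prose definition leaves ambiguous.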
(CreditGrade levels are then ordered as "NC", "HR", "E", "D", "C", "B", "A", "AA".)
(ProsperRating..Alpha. levels are ordered as "HR", "E", "D", "C", "B", "A", "AA".)
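In R, an ordering like this is usually imposed with `factor(x, levels = ..., ordered = TRUE)`; the original code is not shown, so that is an assumption. The Python sketch below mirrors the idea with a rank lookup, so that grades sort from worst to best instead of alphabetically. Sending unknown or blank levels (such as the `""` level in the raw data) to the end is a choice made for this sketch, and the helper names are hypothetical.

```python
from typing import Callable, Iterable, List

CREDIT_GRADE_ORDER = ["NC", "HR", "E", "D", "C", "B", "A", "AA"]
PROSPER_RATING_ORDER = ["HR", "E", "D", "C", "B", "A", "AA"]


def rank_key(order: List[str]) -> Callable[[str], int]:
    """Return a sort key that ranks levels by their position in `order`."""
    rank = {level: i for i, level in enumerate(order)}
    # Levels not in `order` (e.g. the blank "" level) sort after all known ones.
    return lambda level: rank.get(level, len(order))


def sort_levels(values: Iterable[str], order: List[str]) -> List[str]:
    """Sort grade labels from worst to best according to `order`."""
    return sorted(values, key=rank_key(order))


print(sort_levels(["A", "", "HR", "AA", "C"], CREDIT_GRADE_ORDER))
# -> ['HR', 'C', 'A', 'AA', '']
```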
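Most borrowers' credit scores fall in the 620-750 range.
Most borrowers had 0-20 current credit lines when the credit profile was pulled.
Most borrowers had 0-60 credit lines over the past 7 years.
The most common number of open revolving accounts is 5, and the counts decline from 5 upward.
Most borrowers have few current delinquencies.
Delinquencies over the past 7 years are not severe for most borrowers.
The count of borrowers increases with the number of trade lines from 0 to 20 and decreases from 20 to 70.
The plot shows that most borrowers' debt-to-income ratio lies between 0 and 0.5. The higher this ratio, the weaker a borrower's financial position and the lower their repayment capacity.
The plot shows that the most common borrower income ranges are $25,000-49,999, $50,000-74,999, and $100,000+, which suggests that most borrowers have fairly substantial income.
Among borrowers with previous Prosper loans, one prior loan is the most common count.
Original loan amounts of 4,000-5,000, 10,000, and 15,000 are the most common.
The most common scheduled monthly loan payment is about 200.
The number of investors funding a loan is most often between 0 and 100 and declines beyond that.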
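CreditGrade: credit grade, reflecting the borrower's grade before July 1, 2009; a higher grade indicates stronger repayment ability.
BorrowerAPR: the borrower's annual percentage rate (APR) for the loan.
BorrowerRate: the loan's interest rate, used as a proxy for the price of borrowing on the P2P platform. BorrowerRate excludes other fees and is the return the borrower pays to investors.
EstimatedEffectiveYield: effective yield equals the borrower interest rate (i) minus the servicing fee rate, (ii) minus estimated uncollected interest on charge-offs, (iii) plus estimated collected late fees. Applicable for loans originated after July 2009.
ProsperRating..numeric.: the Prosper Rating assigned when the listing was created: 0 - N/A, 1 - HR, 2 - E, 3 - D, 4 - C, 5 - B, 6 - A, 7 - AA. Applicable for loans originated after July 2009.
ProsperRating..Alpha.: credit rating, reflecting the borrower's rating after July 1, 2009; a higher rating indicates stronger repayment ability.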
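ProsperScore: a custom risk score built using historical Prosper data. Scores range from 1 to 10, with 10 being the best, or lowest-risk, score, although the str() output above includes a value of 11. Applicable for loans originated after July 2009.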
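ListingCategory: the category the borrower selected when posting the listing: 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7 - Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans.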
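EmploymentStatus: the borrower's employment status (Self-employed, Employed, etc.).
BorrowerState: the borrower's state, as a two-letter abbreviation.
Occupation: the borrower's occupation.
EmploymentStatusDuration: how long the borrower has been in the current employment status, in months.
IsBorrowerHomeowner: a borrower is classified as a homeowner if they have a mortgage on their credit profile or provide documentation confirming that they are a homeowner.
CurrentlyInGroup: whether the borrower was in a group at the time the listing was created.
BankcardUtilization: the percentage of available revolving credit that is utilized.
CreditScoreRangeLower: the lower bound of the borrower's credit score range, as provided by a consumer credit rating agency.
CreditScoreRangeUpper: the upper bound of the borrower's credit score range, as provided by a consumer credit rating agency.
LoanOriginationDate: the date the loan was originated.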
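CurrentCreditLines: number of current credit lines at the time the credit profile was pulled.
TotalCreditLinespast7years: number of credit lines in the past seven years at the time the credit profile was pulled.
OpenRevolvingAccounts: number of open revolving accounts.
CurrentDelinquencies: number of accounts delinquent at the time the credit profile was pulled.
DelinquenciesLast7Years: number of delinquencies in the past 7 years at the time the credit profile was pulled; to some extent this reflects the creditworthiness of the borrower posting the listing.
TotalTrades: number of trade lines ever opened.
DebtToIncomeRatio: the borrower's debt-to-income ratio. A higher ratio indicates a weaker financial position and lower repayment capacity, so investors should demand a higher return when such a borrower borrows on the P2P platform.
IncomeRange: the borrower's annual income range.
TotalProsperLoans: number of Prosper loans the borrower had at the time this listing was created.
LoanOriginalAmount: the original amount of the loan.
MonthlyLoanPayment: the scheduled monthly loan payment.
Investors: the number of investors who funded the loan.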
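Two of the coded columns can be decoded with plain lookup tables built from the descriptions above. The Python sketch below (helper names hypothetical) decodes and counts the first ten values of ProsperRating..numeric. and ListingCategory..numeric. as printed by str() at the top of this section. Those ten rows are only a preview, so their counts say nothing about the full dataset; the point is that code 4 decodes to C and code 1 to Debt Consolidation, which ties the "rating 4" and "category 1" observations in the univariate notes to their labels.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Lookup tables transcribed from the variable descriptions above.
PROSPER_RATING: Dict[int, str] = {0: "N/A", 1: "HR", 2: "E", 3: "D", 4: "C",
                                  5: "B", 6: "A", 7: "AA"}
LISTING_CATEGORY: Dict[int, str] = {
    0: "Not Available", 1: "Debt Consolidation", 2: "Home Improvement",
    3: "Business", 4: "Personal Loan", 5: "Student Use", 6: "Auto",
    7: "Other", 8: "Baby&Adoption", 9: "Boat", 10: "Cosmetic Procedure",
    11: "Engagement Ring", 12: "Green Loans", 13: "Household Expenses",
    14: "Large Purchases", 15: "Medical/Dental", 16: "Motorcycle",
    17: "RV", 18: "Taxes", 19: "Vacation", 20: "Wedding Loans",
}


def decode(code: Optional[int], table: Dict[int, str]) -> Optional[str]:
    """Return the label for `code`, or None for a missing (NA) or unknown code."""
    return None if code is None else table.get(code)


def label_counts(codes: Iterable[Optional[int]], table: Dict[int, str]) -> Counter:
    """Count decoded labels, skipping missing or unknown codes."""
    labels = (decode(code, table) for code in codes)
    return Counter(label for label in labels if label is not None)


# First ten values of each column as printed by str() above (None stands for NA).
rating_sample = [None, 6, None, 6, 3, 5, 2, 4, 7, 7]
category_sample = [0, 2, 0, 16, 2, 1, 1, 2, 7, 7]

print(decode(4, PROSPER_RATING), decode(1, LISTING_CATEGORY))
# -> C Debt Consolidation
print(label_counts(category_sample, LISTING_CATEGORY).most_common(1))
# -> [('Home Improvement', 3)]
```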
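The univariate plots explored the following questions: where do borrowers come from, what are their income ranges and occupations, what are their credit ratings and bankcard usage, what are their debt-to-income ratios, and how many investors fund each loan?
The main features used were ProsperRating..numeric., ProsperRating..Alpha., BankcardUtilization, MonthlyLoanPayment, LoanOriginalAmount, Investors, IncomeRange, and others.
Because the x-axis labels of IncomeRange overlapped, the plot was adjusted to rotate them vertically. For some variables, such as TotalCreditLinespast7years and TotalTrades, the x-axis range was narrowed so that their variation is easier to see. BankcardUtilization was split into high, mid, and low groups to show more directly what share of their bankcard limit borrowers use. Finally, splitting the credit ratings at 2009-07-01 made the pattern clearer.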
##
## Pearson's product-moment correlation
##
## data: pf$CreditRangeHighLow and pf$TotalTrades
## t = 46.159, df = 106390, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1342225 0.1460043
## sample estimates:
## cor
## 0.1401184
TotalTrades and CreditRangeHighLow are only weakly positively correlated (r ≈ 0.14), so there is no clear relationship between credit score and the number of trade lines.
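The t value in these outputs follows from r and the degrees of freedom alone, via the standard test statistic for a Pearson correlation, t = r * sqrt(df / (1 - r^2)) with df = n - 2. It also shows why p is below 2.2e-16 even though r ≈ 0.14 is small: with more than 100,000 pairs, even a weak correlation is far from zero in t terms. A short Python check against the printed TotalTrades result (the function name is hypothetical):

```python
import math


def pearson_t(r: float, df: int) -> float:
    """t statistic for H0: rho = 0, where df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


# r and df copied from the TotalTrades output above; R printed t = 46.159.
t = pearson_t(0.1401184, 106390)
print(f"t = {t:.2f}")
```

Because the printed r is rounded to seven significant digits, the recomputed t agrees with R's value only to about two decimal places.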
##
## Pearson's product-moment correlation
##
## data: pf$CreditRangeHighLow and pf$TradesNeverDelinquent..percentage.
## t = 173.16, df = 106390, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4641980 0.4735735
## sample estimates:
## cor
## 0.468899
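There is a moderate positive correlation (r ≈ 0.47): the higher the credit score, the larger the share of a borrower's trades that have never been delinquent.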
##
## Pearson's product-moment correlation
##
## data: pf$CreditRangeHighLow and pf$EstimatedEffectiveYield
## t = -144.8, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4505058 -0.4397151
## sample estimates:
## cor
## -0.4451266
CreditRangeHighLow and EstimatedEffectiveYield are negatively correlated (r ≈ -0.445).
##
## Pearson's product-moment correlation
##
## data: pf$CreditRangeHighLow and pf$BorrowerRate
## t = -175.17, df = 113340, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4661358 -0.4569730
## sample estimates:
## cor
## -0.4615667
BorrowerRate and CreditRangeHighLow are negatively correlated (r ≈ -0.46): borrowers with higher credit scores tend to obtain loans at lower interest rates.
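The 95 percent confidence intervals in these outputs come from the Fisher z-transformation that R's `cor.test` uses: z = atanh(r), standard error 1/sqrt(n - 3), and the bounds are mapped back with tanh. A Python sketch (names hypothetical) that reproduces the BorrowerRate interval, with n = df + 2:

```python
import math
from statistics import NormalDist
from typing import Tuple


def pearson_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Fisher-z confidence interval for a Pearson correlation from n pairs."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# r = -0.4615667 with df = 113340, so n = 113342; R printed (-0.4661358, -0.4569730).
low, high = pearson_ci(-0.4615667, 113340 + 2)
print(f"{low:.7f} {high:.7f}")
```

With n above 100,000 the interval is only about 0.01 wide, so the estimated strength of each correlation is very precise; whether the relationship matters in practice depends on r itself, not on the p-value.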
(IncomeRange levels are ordered as "Not displayed", "Not employed", "$0", "$1-24,999", "$25,000-49,999", "$50,000-74,999", "$75,000-99,999", "$100,000+".)
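As income rises, the loan amount also increases.
The 36-month term corresponds to the highest numbers of days delinquent, but the plot does not show a clear relationship between Term and LoanCurrentDaysDelinquent.
The plot shows no clear relationship between the monthly loan payment and the loan amount.
The plot shows little association between the monthly loan payment and the number of months since origination.
The more credit lines a borrower had over the past 7 years, the more open revolving accounts they tend to have.
There is no clear association between the number of investors and the number of open revolving accounts.
The bivariate plots show that the borrower rate is negatively correlated with credit score, that higher income makes it possible to borrow more, that the repayment term tends to increase with the loan amount, and that borrowers with more credit lines open more revolving accounts.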
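Before 2009-07-01, credit scores start at about 480 and the share of trades never delinquent rises gradually from 0; after 2009-07-01, credit scores start at about 600 and the share rises from about 0.75.
(Split at 2009-07-01.)
The line relating credit grade to the share of trades never delinquent changes from a clear upward trend to a nearly flat one, which suggests that Prosper tightened its management of delinquencies and credit grades.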
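The split at 2009-07-01 can be sketched in Python (names hypothetical). LoanOriginationDate is stored as text such as "2005/11/15 0:00" (see the str() output), so it is parsed first. The subsets in the regression calls below use the strict comparisons `LoanOriginationDate < "2009-07-01"` and `LoanOriginationDate > "2009-07-01"`, so loans originated on 2009-07-01 itself appear to fall into neither group; the sketch keeps that behaviour.

```python
from datetime import date, datetime
from typing import Dict, List, Tuple

CUTOFF = date(2009, 7, 1)
Row = Dict[str, str]


def parse_origination_date(text: str) -> date:
    """Parse a LoanOriginationDate string such as "2005/11/15 0:00"."""
    return datetime.strptime(text, "%Y/%m/%d %H:%M").date()


def split_at_cutoff(rows: List[Row], cutoff: date = CUTOFF) -> Tuple[List[Row], List[Row]]:
    """Split rows into (before, after) the cutoff; rows dated on the cutoff are
    dropped, mirroring the strict < and > comparisons in the regression subsets."""
    before: List[Row] = []
    after: List[Row] = []
    for row in rows:
        d = parse_origination_date(row["LoanOriginationDate"])
        if d < cutoff:
            before.append(row)
        elif d > cutoff:
            after.append(row)
    return before, after


# Hypothetical rows: one before, one on, and one after the cutoff.
rows = [{"LoanOriginationDate": s}
        for s in ["2005/11/15 0:00", "2009/7/1 0:00", "2012/1/5 0:00"]]
before, after = split_at_cutoff(rows)
print(len(before), len(after))
```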
The plot shows that credit scores of 700-750 account for most of the data.
##
## Calls:
## m1: lm(formula = I(TradesNeverDelinquent..percentage.) ~ I(CreditRangeHighLow),
## data = subset(pf, LoanOriginationDate < "2009-07-01" & !is.na(CreditGrade) &
## !is.na(LoanOriginalAmount) & !is.na(Bankcard) & !is.na(CreditRangeHighLow)))
## m2: lm(formula = I(TradesNeverDelinquent..percentage.) ~ I(CreditRangeHighLow) +
## CreditGrade, data = subset(pf, LoanOriginationDate < "2009-07-01" &
## !is.na(CreditGrade) & !is.na(LoanOriginalAmount) & !is.na(Bankcard) &
## !is.na(CreditRangeHighLow)))
## m3: lm(formula = I(TradesNeverDelinquent..percentage.) ~ I(CreditRangeHighLow) +
## CreditGrade + LoanOriginalAmount, data = subset(pf, LoanOriginationDate <
## "2009-07-01" & !is.na(CreditGrade) & !is.na(LoanOriginalAmount) &
## !is.na(Bankcard) & !is.na(CreditRangeHighLow)))
## m4: lm(formula = I(TradesNeverDelinquent..percentage.) ~ I(CreditRangeHighLow) +
## CreditGrade + LoanOriginalAmount + Bankcard, data = subset(pf,
## LoanOriginationDate < "2009-07-01" & !is.na(CreditGrade) &
## !is.na(LoanOriginalAmount) & !is.na(Bankcard) & !is.na(CreditRangeHighLow)))
##
## =====================================================================================
## m1 m2 m3 m4
## -------------------------------------------------------------------------------------
## (Intercept) -0.163*** 0.100 0.078 -0.145*
## (0.011) (0.065) (0.065) (0.062)
## I(CreditRangeHighLow) 0.001*** 0.001*** 0.001*** 0.001***
## (0.000) (0.000) (0.000) (0.000)
## CreditGrade: AA -0.022*** -0.023*** -0.014*
## (0.007) (0.007) (0.006)
## CreditGrade: B 0.001 0.003 -0.002
## (0.006) (0.006) (0.005)
## CreditGrade: C -0.024** -0.018* -0.022**
## (0.008) (0.008) (0.008)
## CreditGrade: D -0.013 -0.004 -0.008
## (0.011) (0.011) (0.011)
## CreditGrade: E -0.055*** -0.044** -0.038**
## (0.015) (0.015) (0.014)
## CreditGrade: HR -0.153*** -0.140*** -0.104***
## (0.018) (0.018) (0.018)
## LoanOriginalAmount 0.000*** 0.000*
## (0.000) (0.000)
## Bankcard: High/0 0.109***
## (0.005)
## Bankcard: low/0 0.098***
## (0.004)
## Bankcard: mid/0 0.158***
## (0.004)
## -------------------------------------------------------------------------------------
## R-squared 0.282 0.306 0.308 0.369
## adj. R-squared 0.282 0.306 0.307 0.368
## sigma 0.175 0.172 0.172 0.164
## F 8384.655 1346.258 1185.582 1132.477
## p 0.000 0.000 0.000 0.000
## Log-likelihood 6943.431 7311.064 7332.310 8315.605
## Deviance 652.241 630.160 628.907 573.563
## AIC -13880.863 -14604.128 -14644.620 -16605.209
## BIC -13856.957 -14532.409 -14564.933 -16501.616
## N 21349 21349 21349 21349
## =====================================================================================
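Several rows of this table can be recomputed from the others with the standard formulas for a linear model with n observations and p predictor columns (each CreditGrade or Bankcard dummy counts as one column): F = (R²/p) / ((1 - R²)/(n - p - 1)), adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), AIC = -2 logLik + 2k and BIC = -2 logLik + k ln(n), where k = p + 2 because R's `logLik` for `lm` also counts the intercept and the residual variance. The Python sketch below (names hypothetical) checks m1 (p = 1) and m4 (p = 11); because the table rounds R² and the log-likelihood, the recomputed values agree only to within that rounding.

```python
import math
from typing import Tuple


def f_statistic(r2: float, n: int, p: int) -> float:
    """Overall F statistic of a linear model with an intercept and p predictors."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """R-squared penalised for the number of predictors p."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def aic_bic(log_lik: float, n: int, k: int) -> Tuple[float, float]:
    """AIC and BIC given a log-likelihood and k estimated parameters."""
    return -2.0 * log_lik + 2 * k, -2.0 * log_lik + k * math.log(n)


n = 21349  # N for m1-m4
for name, r2, p, log_lik in [("m1", 0.282, 1, 6943.431), ("m4", 0.369, 11, 8315.605)]:
    aic, bic = aic_bic(log_lik, n, p + 2)
    print(name, round(f_statistic(r2, n, p), 1), round(adjusted_r2(r2, n, p), 3),
          round(aic, 3), round(bic, 3))
```

The drop in AIC and BIC from m1 to m4 is large relative to the extra parameters, which is consistent with the R-squared rising from 0.282 to 0.369 once CreditGrade, LoanOriginalAmount and Bankcard are added.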
##
## Calls:
## m5: lm(formula = I(TradesNeverDelinquent..percentage.) ~ I(CreditRangeHighLow),
## data = subset(pf, LoanOriginationDate > "2009-07-01" & !is.na(ProsperRating..Alpha.) &
## !is.na(LoanOriginalAmount) & !is.na(Bankcard) & !is.na(CreditRangeHighLow)))
## m6: lm(formula = I(TradesNeverDelinquent..percentage.) ~ I(CreditRangeHighLow) +
## ProsperRating..Alpha., data = subset(pf, LoanOriginationDate >
## "2009-07-01" & !is.na(ProsperRating..Alpha.) & !is.na(LoanOriginalAmount) &
## !is.na(Bankcard) & !is.na(CreditRangeHighLow)))
## m7: lm(formula = I(TradesNeverDelinquent..percentage.) ~ I(CreditRangeHighLow) +
## ProsperRating..Alpha. + LoanOriginalAmount, data = subset(pf,
## LoanOriginationDate > "2009-07-01" & !is.na(ProsperRating..Alpha.) &
## !is.na(LoanOriginalAmount) & !is.na(Bankcard) & !is.na(CreditRangeHighLow)))
## m8: lm(formula = I(TradesNeverDelinquent..percentage.) ~ I(CreditRangeHighLow) +
## ProsperRating..Alpha. + LoanOriginalAmount + Bankcard, data = subset(pf,
## LoanOriginationDate > "2009-07-01" & !is.na(ProsperRating..Alpha.) &
## !is.na(LoanOriginalAmount) & !is.na(Bankcard) & !is.na(CreditRangeHighLow)))
##
## =============================================================================================
## m5 m6 m7 m8
## ---------------------------------------------------------------------------------------------
## (Intercept) 0.268*** 0.352*** 0.357*** 0.149***
## (0.006) (0.007) (0.007) (0.008)
## I(CreditRangeHighLow) 0.001*** 0.001*** 0.001*** 0.001***
## (0.000) (0.000) (0.000) (0.000)
## ProsperRating..Alpha.: .L 0.033*** 0.018*** 0.021***
## (0.002) (0.002) (0.002)
## ProsperRating..Alpha.: .Q -0.008*** -0.001 0.004**
## (0.001) (0.001) (0.001)
## ProsperRating..Alpha.: .C -0.009*** -0.006*** -0.005***
## (0.001) (0.001) (0.001)
## ProsperRating..Alpha.: ^4 0.002* 0.000 0.000
## (0.001) (0.001) (0.001)
## ProsperRating..Alpha.: ^5 0.001 0.000 -0.000
## (0.001) (0.001) (0.001)
## ProsperRating..Alpha.: ^6 -0.005*** -0.004*** -0.003***
## (0.001) (0.001) (0.001)
## LoanOriginalAmount 0.000*** 0.000***
## (0.000) (0.000)
## Bankcard: High/0 0.088***
## (0.004)
## Bankcard: low/0 0.046***
## (0.002)
## Bankcard: mid/0 0.091***
## (0.002)
## ---------------------------------------------------------------------------------------------
## R-squared 0.122 0.130 0.140 0.181
## adj. R-squared 0.122 0.130 0.140 0.181
## sigma 0.114 0.113 0.113 0.110
## F 11797.459 1818.741 1728.985 1707.486
## p 0.000 0.000 0.000 0.000
## Log-likelihood 64081.687 64489.996 64965.972 67043.067
## Deviance 1097.052 1086.545 1074.423 1023.089
## AIC -128157.374 -128961.992 -129911.944 -134060.133
## BIC -128129.328 -128877.854 -129818.457 -133938.601
## N 84853 84853 84853 84853
## =============================================================================================
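The .L, .Q, .C, ^4, ^5 and ^6 labels show that ProsperRating..Alpha. entered m6-m8 as an ordered factor (consistent with the HR < ... < AA ordering above), for which R's default contrasts are orthogonal polynomials (`contr.poly`): .L is the linear trend across the seven levels, .Q the quadratic, and so on. The positive .L coefficient therefore suggests that, holding the other terms fixed, the share of trades never delinquent rises roughly linearly from HR to AA. CreditGrade in m2-m4, by contrast, is reported with one dummy per level against a baseline, as treatment contrasts produce for an unordered factor. The Python sketch below (function name hypothetical) builds the same kind of orthonormal polynomial columns by Gram-Schmidt on the powers of centred level scores; up to floating-point error these should match `contr.poly(7)`.

```python
import math
from typing import List


def poly_contrasts(k: int) -> List[List[float]]:
    """Orthonormal polynomial contrasts (degrees 1..k-1) for k equally spaced levels.

    Each returned list is one contrast column with k entries, in level order.
    """
    scores = [i - (k + 1) / 2 for i in range(1, k + 1)]  # centred scores for levels 1..k
    basis: List[List[float]] = []
    for degree in range(k):
        v = [s ** degree for s in scores]
        for b in basis:  # remove components along the lower-degree columns
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant (degree 0) column


labels = [".L", ".Q", ".C"] + [f"^{d}" for d in range(4, 7)]
for label, column in zip(labels, poly_contrasts(7)):
    print(label, [round(c, 4) for c in column])
```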
# Final Plots and Summary
##
## Pearson's product-moment correlation
##
## data: pf$CreditRangeHighLow and pf$TradesNeverDelinquent..percentage.
## t = 173.16, df = 106390, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4641980 0.4735735
## sample estimates:
## cor
## 0.468899
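The positive correlation between credit score and the share of trades never delinquent (r ≈ 0.47) suggests that the two are closely related and may influence each other.
Comparing the periods before and after 2009-07-01: after that date, credit scores start at about 600 and the share of trades never delinquent rises from about 0.75, whereas before it, credit scores start at about 480 and the share rises from 0. This suggests that Prosper tightened its management of credit scores and the share of trades never delinquent.
Around 2009, the line relating credit grade to the share of trades never delinquent changes from a gradual rise to a flat line, which suggests that Prosper strengthened its management of delinquencies and credit grades.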
# Limitations
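This dataset has many variables, so only a selection of important or interesting ones could be analyzed. The univariate section plotted 30 variables; the bivariate section analyzed repayment term, income range, loan amount, days delinquent, and others; and the multivariate section, using 2009 as the dividing point, explored credit score, the share of trades never delinquent, loan amount, credit grade, and other variables. Many variables remain unexplored and could be investigated later, for example with machine learning methods.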